Abstract

Phase transitions in combinatorial problems have re- 
cently been shown Q to be useful in locating "hard" 
instances of combinatorial problems. The connection 
between computational complexity and the existence 
of phase transitions has been addressed in Statistical 
Mechanics (2) and Artificial Intelligence but not 
studied rigorously. 

We take a first step in this direction by investigat- 
ing the existence of sharp thresholds for the class of 
generalized satisfiability problems, defined by Schae- 
fer Q. In the case when all constraints have a 
special clausal form we completely characterize the 
generalized satisfiability problems that have a sharp 
threshold. While NP-completeness does not imply
the sharpness of the threshold, our result suggests
that the class of counterexamples is rather limited,
as all such counterexamples can be predicted, with
constant success probability, by a single procedure.

1 Introduction 

Which combinatorial problems have "hard" instances?
Computational Complexity is the main theory that attempts
to provide answers to this question. But it is not the only
one. While the concept of NP-complete problem, as a paradigm
for "problem with hard instances", has permeated a wide range
of fields, from Computational Biology to Economics, it is not
usually considered extremely relevant by practitioners. This
happens because NP-completeness is an overly pessimistic,
worst-case concept; in fact, unless we are very careful about
the random model, "most" instances of many NP-complete
problems turn out to be "easy".

*An extended version will be available shortly as [1].

Much insight in locating the regions "where the 
really hard instances are" has come from an analogy 
with Statistical Mechanics, in the context of phase 
transitions in combinatorial problems. Recent studies 
||] have shown that a certain type of phase transitions 
(called first- order phase transitions) is responsible for 
the exponential slowdown of many natural algorithms 
when run on instances at the transition point. 

A natural, and early stated question is whether 
there exists any connection between computational 
complexity and the existence of a phase transition. 
Obtaining an answer to this question is further com- 
plicated by the fact that the physicists' and computer 
scientists' concepts of phase transitions are different: 
the former pertains to combinatorial optimization, 
and is called order- disorder phase transition, while 
the latter applies to decision problems and is called 
threshold property, more specifically a restricted form 
of threshold property called sharp threshold^1. It is
this type of phase transition that we are primarily
interested in in this paper.

1 see Definition 3.



The above question has been asked for both types 
of phase transitions: Fu || argued that there should 
be no connection between worst-case computational 
complexity and the existence of an order-disorder 
phase transition, by showing that an NP-complete 
problem, number partition, has no order-disorder 
phase transition (however see || that argues that 
number partition has an order-disorder phase tran- 
sition under a different random model). The case 
of decision problems is even more spectacular: in a 
paper that proved very influential in the Artificial In- 
telligence community ||, Cheeseman, Kanefsky and 
Taylor conjectured that roughly the difference be- 
tween tractable and intractable problems, specifically 
between problems in P and NP-complete problems is 
that: 

1. NP-complete problems have a phase transition 
(sharp threshold) with respect to "some" order 
parameter. 

2. in contrast, problems in P lack such a threshold. 

Their conjecture was at best wishful thinking. 
First, they did not make it precise enough, by speci- 
fying what an order parameter is. Second, they had 
no evidence supporting such a radical statement. In 
fact, examples of problems in P that do have a sharp 
threshold with respect to a "reasonable" order pa- 
rameter had already long been known (for instance 
the probability that a random graph has a connected 
component of at least, say, $n^{3/4}$ vertices, by the
classical results of Erdős and Rényi).

A natural question is whether there is any con- 
nection at all between computational complexity and 
the existence of a sharp threshold at least for prob- 
lems that possess some "canonical" order parame- 
ter. One restriction that entails the existence of a 
canonical order parameter is the very one which was 
used in defining threshold properties: monotonicity 
||. Clearly the above-mentioned example shatters 
the hope of obtaining a version of (2) even for monotonic
problems. A quick argument shows that even (1) should fail:
in any polynomial degree there exist monotone problems that
have sharp thresholds as well as monotone problems that do
not. The intuitive reason is that the existence of a sharp
threshold is a statistical property,



that is not affected by modifying a given problem 
on a set of instances that has zero measure. On the 
other hand worst-case complexity is sensitive to such 
changes. The result is formally stated as Proposition 5.1
in the Appendix.

Given the above argument it would seem that the
question has been answered, and that no connection
whatsoever exists between the two concepts. However
the examples constructed in Proposition 5.1 are



rather artificial, and the overall proof is reminiscent 
of Ladner's || result on the structure of polyno- 
mial degrees: we can construct a set of the desired 
complexity by starting with a certain base set and 
"tuning-up" its worst-case complexity on a set that 
is "small enough" so that this does not affect the 
other desirable property of the base set, having a 
sharp/coarse threshold. The question still remains 
whether the result remains true if we only consider 
problems with a certain "natural" structure. After 
all, this is true in the case of computational com- 
plexity: Schaefer ||), showed that, when restricted 
to the class of generalized satisfiability problems, the 
rich structure of polynomial m-degrees derived from 
Ladner's results simplifies to only two degrees, P and 
the degree of NP-complete problems, and obtained a 
full characterization of such problems. 

Definition 1 Let $S = \{R_1, \ldots, R_p\}$, $R_i \subseteq \{0,1\}^{r_i}$,
be a finite set of relations. An S-formula in n variables
is a finite conjunction of clauses, i.e. expressions
of the type $R_i(x_{j_1}, \ldots, x_{j_{r_i}})$, with the variables
$x_j$ chosen from a fixed set of n variables $x_1, \ldots, x_n$.
SAT(S) is the problem of deciding whether an arbitrary
S-formula has a satisfying assignment $x_1 \ldots x_n$
(one that makes each clause true).
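
To make Definition 1 concrete, here is a minimal brute-force sketch of SAT(S). The encoding is an assumption made for illustration only: a relation is a Python set of allowed 0/1 tuples, a clause is a pair (relation, variable indices), and the names `satisfies`, `brute_force_sat`, `NEQ` and `OR2` are hypothetical.

```python
from itertools import product

def satisfies(assignment, clauses):
    """Return True if `assignment` (a tuple of 0/1 values) makes every clause true.

    Each clause is a pair (relation, variables): `relation` is a set of
    allowed 0/1 tuples, `variables` lists the indices of the variables it reads.
    """
    return all(tuple(assignment[v] for v in variables) in relation
               for relation, variables in clauses)

def brute_force_sat(n, clauses):
    """Decide an S-formula over n variables by trying all 2^n assignments."""
    return any(satisfies(a, clauses) for a in product((0, 1), repeat=n))

# Two example relations of arity 2 (illustrative, not taken from the paper).
NEQ = {(0, 1), (1, 0)}          # "x differs from y"
OR2 = {(0, 1), (1, 0), (1, 1)}  # "x or y"
```

The exhaustive search is exponential in n and is meant only to pin down the semantics of an S-formula, not to be an algorithm of interest.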

A pleasant feature of Schaefer's framework is that 
every problem SAT(S) is monotonic. Clearly, an 
analog of (2) fails in this case as well: the density
result Proposition 5.1 is still true for one of the
two polynomial degrees, P, as 2-SAT has a sharp 
threshold (To), while e.g. at-most-2-HORN-SAT has 
a coarse threshold. On the other hand there exists
some evidence that some notion of computational
intractability implies the existence of a sharp thresh- 
old: in his celebrated result on sharp thresholds for 



3-SAT Friedgut gives an example of an NP-complete
graph problem having a coarse threshold: the prop- 
erty of containing either a triangle or a "large" clique. 
From a probabilistic standpoint the second part is 
"not important" . Moreover, his characterization the- 
orem implies that any graph theoretic property that 
fails to have a sharp threshold can be well "approximated"
by a tractable property, the property of containing a copy
of a fixed graph. Finally, there is an altogether different
reason for a rigorous study of sharp thresholds in
satisfiability problems: in this case the notion of a
first-order phase transition (that, as mentioned, does have
significant algorithmic implications) has a nice
combinatorial interpretation, as a "sudden jump" in the
relative size of a combinatorial parameter called backbone
(see e.g. [12] for definition and discussion). It is easy to
show (this is an argument implicitly made in || that will be presented
in the full version of the paper) that the discontinu- 
ity of the backbone implies the existence of a sharp 
threshold. Therefore studying problems with sharp 
thresholds is a useful first step towards identifying 
all satisfiability problems having a first order phase 
transition. 

It is, perhaps, tempting to conjecture that, when 
restricted to Schaefer's framework an analogue of (1) 
holds: 

Hypothesis 1 Every generalized satisfiability prob- 
lem SAT(S) that Schaefer's dichotomy theorem FJ/ 
identifies as NP-complete has a sharp threshold. 

We further restrict our framework to the case when 
all constraints in S have a special, clausal form. In 
this case we obtain a complete characterization of all 
sets of constraints S for which SAT(S) has a sharp 
threshold. In a preliminary version of this paper we
claimed that for clausal constraints NP-completeness
implies the existence of a sharp threshold. Unfortunately
this is not true, as the revised version of our
result shows. On the other hand, as displayed by
Corollary 1, the class of counterexamples is rather
limited: they are those NP-complete problems for
which satisfiability of a random instance $\Phi$ can be
predicted with significant success by a very trivial
heuristic: if neither $0^n$ nor $1^n$ is a satisfying
assignment then return "unsatisfiable". So the lack of a



sharp threshold does have algorithmic implications, 
albeit in a probabilistic sense. 
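
The trivial heuristic just described can be sketched in a few lines. The clause encoding is an assumption made for illustration: each clause is a pair (negated variables, positive variables), and the name `trivial_heuristic` is hypothetical.

```python
def trivial_heuristic(clauses):
    """Guess satisfiability by testing only the all-zeros and all-ones assignments.

    `clauses` is a list of pairs (negated_vars, positive_vars). The all-zeros
    assignment satisfies a clause exactly when the clause has a negated
    literal; the all-ones assignment does so exactly when it has a positive one.
    Returns True ("satisfiable") or False (the guess "unsatisfiable").
    """
    zeros_ok = all(len(neg) > 0 for neg, pos in clauses)
    ones_ok = all(len(pos) > 0 for neg, pos in clauses)
    return zeros_ok or ones_ok
```

A True answer is always correct, since it comes with a witness; a False answer is only a guess, which is why the success guarantee is probabilistic.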

2 Preliminaries 

We will work in the context of NP-decision problems, 
a standard concept in Complexity Theory. For a pre- 
cise definition see, e.g., (f|. 

Definition 2 The NP-decision problem P is monotonically
decreasing if for every instance x of P and
every witness y for x, y is a witness for every
instance z obtained by turning some bits of x from 1
to 0. Monotonically increasing problems are defined
similarly.

The three main random models from random graph
theory, the so-called constant probability model, the
counting model and the multiset model, extend directly to
NP-decision problems, and are interchangeable under
quite liberal conditions. For technical convenience we
will use the constant probability model when proving
sharp thresholds and the multiset model when dealing
with coarse thresholds. The following is a brief
review. The multiset model, denoted $\Omega(n, m)$, has two
integer parameters n, m. A random sample from
$\Omega(n, m)$ is obtained by starting with the string
$z = 0^n$, choosing (uniformly at random and with
repetition) m bits of z, and flipping these bits to
one. When n is known, we use $\mu_m(A)$ to refer to the
measure of a set A under this random model. The
constant probability model, denoted $\Omega_p(n)$, has two
parameters, an integer n and a real number $p \in [0, 1]$.
A random sample from $\Omega_p(n)$ is obtained by starting
with the string $z = 0^n$ and then flipping the bits of z
to one independently with probability p.
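
The two sampling procedures can be sketched as follows. The function names and the optional `rng` argument are assumptions introduced for illustration.

```python
import random

def sample_multiset(n, m, rng=random):
    """Multiset model: start from 0^n and set m positions, drawn uniformly
    at random with repetition, to one."""
    z = [0] * n
    for _ in range(m):
        z[rng.randrange(n)] = 1
    return z

def sample_constant_probability(n, p, rng=random):
    """Constant probability model: each bit of 0^n is set to one
    independently with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]
```

Because positions may repeat in the multiset model, a sample can contain fewer than m ones.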

Definition 3 Let P be any monotonically decreasing
decision problem under the constant probability model
$\Omega_p(n)$. A function $\theta$ is a threshold function for P if
for every function p, defined on the set of admissible
instances and taking real values, we have

1. if $p(n) = o(\theta(n))$ then $\lim_{n \to \infty} \Pr_{x \in \Omega_{p(n)}}[x \in P] = 1$, and

2. if $p(n) = \omega(\theta(n))$ then $\lim_{n \to \infty} \Pr_{x \in \Omega_{p(n)}}[x \in P] = 0$.

P has a sharp threshold if in addition the following
property holds:

3. For every $\epsilon > 0$ define the functions
$p_\epsilon(n)$, $p_{1/2}(n)$, $p_{1-\epsilon}(n)$ by

$$\Pr_{x \in \Omega_{p_\epsilon(n)}}[x \in P] = \epsilon, \quad \Pr_{x \in \Omega_{p_{1/2}(n)}}[x \in P] = 1/2, \quad \Pr_{x \in \Omega_{p_{1-\epsilon}(n)}}[x \in P] = 1 - \epsilon.$$

Then we have

$$\lim_{n \to \infty} \frac{p_{1-\epsilon}(n) - p_\epsilon(n)}{p_{1/2}(n)} = 0.$$

If, on the other hand, for some $\epsilon > 0$ the amount
$\frac{p_{1-\epsilon}(n) - p_\epsilon(n)}{p_{1/2}(n)}$ is bounded away from 0 as $n \to \infty$,
P has a coarse threshold. These two cases are not
exhaustive as the above quantity could in principle
oscillate with n. Nevertheless they are so for most
"natural" problems.

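As a worked illustration of the sharp/coarse distinction, consider a toy property that is not taken from the paper: "the sample equals $0^n$" under the constant probability model, whose probability is exactly $(1-p)^n$. The sketch below (function names are hypothetical) solves $(1-p)^n = \delta$ for p and evaluates the normalized width of the threshold interval.

```python
import math

def p_at(delta, n):
    """Solve (1 - p)^n = delta for p, i.e. p = 1 - delta**(1/n).

    expm1 is used to keep precision when p is tiny.
    """
    return -math.expm1(math.log(delta) / n)

def relative_width(eps, n):
    """Normalized width (p_{1-eps}(n) - p_eps(n)) / p_{1/2}(n) of the threshold interval."""
    return (p_at(1 - eps, n) - p_at(eps, n)) / p_at(0.5, n)
```

Since $p_\delta(n) \approx -\ln(\delta)/n$, the ratio tends to $(\ln \epsilon - \ln(1-\epsilon))/\ln 2$, which for $\epsilon = 0.1$ is $-\log_2 9 \approx -3.17$. It stays bounded away from 0, so this toy property has a coarse threshold.
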
Let $f : N \to R$. Define QEMPTY(f) to be the
probability that the following queuing chain:

$$Q_0 = 1, \qquad Q_{i+1} = Q_i - 1 + \Xi_{i+1}$$

(where the $\Xi_i$'s are independent Poisson variables
with parameter $f(i)$) ever remains without customers.
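
QEMPTY(f) has no closed form in general, but it can be approximated by simulation. The following is a rough Monte Carlo sketch under stated assumptions: the chain is truncated after `horizon` steps (so the estimate is biased downwards), the Poisson sampler is Knuth's multiplication method (adequate only for small parameters), and all names and default values are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Sample a Poisson(lam) variable by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def qempty_estimate(f, horizon=200, trials=2000, seed=0):
    """Estimate the probability that the queue Q_0 = 1,
    Q_i = Q_{i-1} - 1 + Xi_i, with Xi_i ~ Poisson(f(i)), ever becomes empty.

    Only the first `horizon` steps are simulated.
    """
    rng = random.Random(seed)
    emptied = 0
    for _ in range(trials):
        q = 1
        for i in range(1, horizon + 1):
            q = q - 1 + poisson(f(i), rng)
            if q == 0:
                emptied += 1
                break
    return emptied / trials
```

For instance, with f identically 0 the single initial customer is served at the first step and the queue empties with probability 1.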

Definition 4 Let $(a, b) \in N \times N \setminus \{(0,0)\}$. Define
$C_{a,b} = \bar{x}_1 \vee \ldots \vee \bar{x}_a \vee x_{a+1} \vee \ldots \vee x_{a+b}$. Such a
relation is called a clausal constraint.

For a set S as in Definition 1 let k be the maximum
arity of a relation in S. To avoid trivial cases,
we assume that $k \ge 2$. For $i = 1, \ldots, k$ let $p_i$ be 1 if
clause $\bar{x}_1 \vee \ldots \vee \bar{x}_{i-1} \vee x_i \in S$ and 0 otherwise, and
let $n_i$ be 1 if clause $\bar{x}_1 \vee \ldots \vee \bar{x}_i \in S$ and 0
otherwise. Define polynomials $P_i(c) = \sum_{j \ge i} \binom{c}{j-i} \cdot p_j$
and $Q_i(c) = \sum_{j \ge i} \binom{c}{j-i} \cdot n_j$. Let $s_k = k p_k + n_k$,
$N_S = \binom{n}{k} \cdot s_k$, and $\alpha = m / N_S$. Finally, let

$$a_0 = \max(\{0\} \cup \{a : C_{a,0} \in S\}),$$

$$a_{\ge 1} = \max(\{0\} \cup \{a : C_{a,b} \in S, b \ge 1\}).$$

$b_0$ and $b_{\ge 1}$ are defined similarly with respect to the
second component.
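
The four extremal parameters are easy to compute once S is given explicitly. The sketch below assumes, for illustration, that S is encoded as a set of pairs (a, b), each standing for the clausal constraint $C_{a,b}$, and that "defined similarly" means swapping the roles of the two components; the function name is hypothetical.

```python
def extremal_parameters(S):
    """Return (a_0, a_{>=1}, b_0, b_{>=1}) for a set S of pairs (a, b).

    a_0     : largest a with C_{a,0} in S (all-negative clauses)
    a_{>=1} : largest a among constraints with at least one positive literal
    b_0     : largest b with C_{0,b} in S (all-positive clauses)
    b_{>=1} : largest b among constraints with at least one negated literal
    Each maximum defaults to 0 when no such constraint exists.
    """
    a0 = max([0] + [a for a, b in S if b == 0])
    a_ge1 = max([0] + [a for a, b in S if b >= 1])
    b0 = max([0] + [b for a, b in S if a == 0])
    b_ge1 = max([0] + [b for a, b in S if a >= 1])
    return a0, a_ge1, b0, b_ge1
```

For example, 3-SAT corresponds to S = {(3, 0), (2, 1), (1, 2), (0, 3)} and yields (3, 2, 3, 2).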

3 Main result 

Recall that a relation is called 0-valid (1-valid) if it 
is satisfied by the assignment "all zeros" ( "all ones" ) 
and Horn (negated Horn) if it is equivalent to a Horn 
(negated Horn) CNF-formula. When S is Horn the
number of clauses in S over n variables is $N_S(1 + o(1))$.
For a property T we will use "S is T" as a
substitute for "every relation in S is T" . 
Our main result is 

Theorem 3.1 Let S be a finite set of clausal con- 
straints. 

a. If S is 0-valid or S is 1-valid then the decision 
problem SAT(S) is trivial. 

b. If S is (Horn ∪ 0-valid) or S is (negated Horn
∪ 1-valid) then SAT(S) has a coarse threshold.

c. Suppose cases a. and b. do not apply. If

$$(a_{\ge 1} < a_0 \le b_0) \vee (b_{\ge 1} < b_0 \le a_0) \vee (a_0 = b_0 = \min\{a_{\ge 1}, b_{\ge 1}\})$$

then SAT(S) has a sharp threshold, otherwise
SAT(S) has a coarse threshold.
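
The case analysis of Theorem 3.1 can be turned into a small classifier. This is only a sketch under stated assumptions: S is encoded as a set of pairs (a, b) standing for $C_{a,b}$, the condition in part c is read as $(a_{\ge 1} < a_0 \le b_0) \vee (b_{\ge 1} < b_0 \le a_0) \vee (a_0 = b_0 = \min\{a_{\ge 1}, b_{\ge 1}\})$, and the function name and return strings are hypothetical.

```python
def classify_threshold(S):
    """Classify SAT(S) for a set S of pairs (a, b), each denoting C_{a,b}.

    C_{a,b} is 0-valid iff a >= 1, 1-valid iff b >= 1,
    Horn iff b <= 1 and negated Horn iff a <= 1.
    """
    zero_valid = lambda c: c[0] >= 1
    one_valid = lambda c: c[1] >= 1
    horn = lambda c: c[1] <= 1
    neg_horn = lambda c: c[0] <= 1

    # Part a: every constraint is 0-valid, or every constraint is 1-valid.
    if all(map(zero_valid, S)) or all(map(one_valid, S)):
        return "trivial"
    # Part b: every constraint is Horn or 0-valid (or the dual condition).
    if all(horn(c) or zero_valid(c) for c in S) or \
       all(neg_horn(c) or one_valid(c) for c in S):
        return "coarse"
    # Part c: compare the extremal parameters.
    a0 = max([0] + [a for a, b in S if b == 0])
    a_ge1 = max([0] + [a for a, b in S if b >= 1])
    b0 = max([0] + [b for a, b in S if a == 0])
    b_ge1 = max([0] + [b for a, b in S if a >= 1])
    sharp = (a_ge1 < a0 <= b0) or (b_ge1 < b0 <= a0) or \
            (a0 == b0 == min(a_ge1, b_ge1))
    return "sharp" if sharp else "coarse"
```

Under this reading, 3-SAT, i.e. S = {(3, 0), (2, 1), (1, 2), (0, 3)}, is classified as sharp, in agreement with Friedgut's result mentioned in the introduction.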

For reasons of space we can do little but present a 
rather sketchy outline of the proof of Theorem 3.1. 
A full version will be given in [1]. The following
corollary (of the preceding result and its proof) sum- 
marizes the intuition that all NP-complete problems 
with coarse thresholds are "rather trivial" . 

Corollary 1 Suppose S is a finite set of clausal con- 
straints. Then SAT(S) has a coarse threshold exactly 
when at least one of the following (non-exclusive) 
conditions applies. 



Program PUR($\Phi$):

    if ($\Phi$ contains no positive unit clause)
        return TRUE
    else
        choose such a positive unit clause x
        if ($\Phi$ contains $\bar{x}$ as a clause)
            return FALSE
        else
            let $\Phi'$ be the formula
                obtained by setting x to 1
            return PUR($\Phi'$)

Figure 1: Algorithm PUR

1. S is Horn. 

2. S is negated Horn. 

3. SAT(S) is NP-complete and has the same
threshold function as the property "$0^n$ satisfies $\Phi$".

4. SAT(S) is NP-complete and has the same
threshold function as the property "$1^n$ satisfies $\Phi$".

Indeed, in cases 3 and 4 there exists a single
trivial algorithm that declares the formula unsatisfiable
if it is satisfied by neither of the two assignments
$0^n$ and $1^n$, and which is correct with a constant
probability $\epsilon$ over the whole range of the parameter p (in
the constant probability model).

Observation 1 In the general case there are other
(non-clausal) examples of satisfiability problems with
a coarse threshold. Let R(x, y) be the relation "$x \ne y$".
Then SAT({R}) is essentially the 2-coloring
problem, which has a coarse threshold.

4 Proof sketch 

b. This part of the proof is constructive. When S
is Horn we explicitly determine the probability 
that a random formula is satisfiable, and then 



use it to argue that the corresponding (Horn ∪
0-valid) cases also have a coarse threshold. The
analysis of the Horn cases is similar to the one 
when S consists of all Horn clauses of length at 
most k, that was settled in pl] |, and is accom- 
plished by analyzing PUR, a natural implemen- 
tation of positive unit resolution, which is com- 
plete for Horn satisfiability. 
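
For readers who want to experiment, the procedure of Figure 1 can be written as a short iterative routine. The encoding is an assumption made for illustration: a formula is a collection of clauses, each clause a collection of nonzero integers, with +v denoting the literal $x_v$ and -v its negation; the function name `pur` is hypothetical.

```python
def pur(formula):
    """Positive unit resolution, following Figure 1.

    Returns True (accept) when no positive unit clause is left, and
    False (reject) when a chosen positive unit clause x also occurs
    negated as a unit clause.
    """
    clauses = {frozenset(c) for c in formula}
    while True:
        # Look for a positive unit clause.
        unit = next((c for c in clauses
                     if len(c) == 1 and next(iter(c)) > 0), None)
        if unit is None:
            return True
        x = next(iter(unit))
        if frozenset([-x]) in clauses:
            return False
        # Set x to 1: drop satisfied clauses, delete the literal -x elsewhere.
        clauses = {c - {-x} for c in clauses if x not in c}
```

Each round removes at least the chosen unit clause, so the loop terminates. On Horn formulas acceptance coincides with satisfiability; on other inputs it need not.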

We regard PUR as working in stages, indexed
by the number of variables still left unassigned;
thus, the stage number decreases as PUR moves
on. We say that formula $\Phi$ survives Stage t if
PUR on input $\Phi$ does not halt at Stage t or
earlier. Let $\Phi_i$ be the formula at the beginning
of stage i, and let $N_i$ denote the number of its
clauses. We will also denote by $P_{i,t}$ ($N_{i,t}$) the
number of clauses of $\Phi_t$ of size i and containing
one (no) positive literal. Define $\Phi^P_{i,t}$ ($\Phi^N_{i,t}$) to
be the subformula of $\Phi_t$ containing the clauses
counted by $P_{i,t}$ ($N_{i,t}$). The analysis proceeds
by showing that we can characterize the evolution
of PUR on a random formula by a Markov
chain, and is based on the following "Uniformity
Lemma" from [11], valid in our context as well:

Lemma 4.1 Suppose that $\Phi$ survives up to
stage t. Then, conditional on the values
$(P_{1,t}, N_{1,t}, \ldots, P_{k,t}, N_{k,t})$, the clauses in
$\Phi^P_{1,t}, \Phi^N_{1,t}, \ldots, \Phi^P_{k,t}, \Phi^N_{k,t}$ are chosen uniformly at
random and are independent. Also, conditional
on the fact that $\Phi$ survives stage t as well, the
following recurrences hold:

\ N ltt -i = Ni, t + Af 2it , 
and, for i — 2, k, 

J Pi,t-x = Pi.t - A& t - A^_ 1)M + A^ +1) ( , 
where 

( A£ )t = B(P M -l,l/t), 

A«-i)i, t = -B(^,*»(i-l)/*). 
< A£. t = I?(P M -A£_ 1)M ,l/*), 

*u-i)i,t = B(N itt ,i/t), 
A p - A N - 



The main intuition for the proof is that with 
high probability the binomial expressions in the 
previous formulas are close to their expected val- 
ues. The proof of this very intuitive statement 
is conceptually simple, but technically somewhat 
involved, and mirrors the proof in JO). So all 
it remains is to characterize the mean values of 
$P_{i,t}$, $N_{i,t}$. We only outline the main steps of
this computation in the sequel, assuming that
the above mentioned concentration results hold.
Define $x_{i,t}$, $y_{i,t}$ by

$$E[P_{i,t}] = i \cdot \binom{t}{i} \cdot x_{i,t}, \qquad E[N_{i,t}] = \binom{t}{i} \cdot y_{i,t}.$$

Then it is easy to see that the sequences $x_{i,t}$, $y_{i,t}$,
$i \ge 2$, verify the recurrences:

$$x_{i,t-1} = x_{i,t} + x_{i+1,t}, \qquad y_{i,t-1} = y_{i,t} + y_{i+1,t}.$$

Define the vector sequence $(Z_t)_{t \ge 0} \in R^{k-1}$ by
$Z_{t+1} = A \cdot Z_t$, with $A = (a_{i,j})$,

$$a_{i,j} = \begin{cases} 1, & \text{if } j = i \text{ or } j = i + 1, \\ 0, & \text{otherwise.} \end{cases}$$

It is easy to see that both sequences $(x_{i,t})_t$ and
$(y_{i,t})_t$ satisfy the same recurrence as $Z_t$. A
simple computation shows that $(A^k)_{i,j} = \binom{k}{j-i}$
(where, for $l < 0$, $\binom{k}{l} = 0$). Therefore
$Z_{i,t} = \sum_{j \ge i} \binom{t}{j-i} Z_{j,0}$. Since $x_{i,n} = \alpha \cdot p_i \cdot (1 + o(1))$,
we have that for every constant $c > 0$,
$x_{i,n-c} = \alpha \cdot P_i(c) \cdot (1 + o(1))$ for every $i \ge 2$. In the same
way $y_{i,n-c} = \alpha \cdot Q_i(c) \cdot (1 + o(1))$.
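
The closed form for the powers of A can be checked numerically. The sketch below assumes that A has ones on the diagonal and on the superdiagonal, which is what the recurrence $x_{i,t-1} = x_{i,t} + x_{i+1,t}$ requires; the helper names are hypothetical and indices are 0-based.

```python
from math import comb

def upper_bidiagonal(d):
    """d x d matrix with ones on the diagonal and the superdiagonal."""
    return [[1 if j in (i, i + 1) else 0 for j in range(d)] for i in range(d)]

def mat_mul(X, Y):
    """Plain integer matrix product."""
    d = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(d)) for j in range(d)]
            for i in range(d)]

def mat_pow(A, k):
    """k-th power of A by repeated multiplication (k >= 0)."""
    d = len(A)
    result = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def binomial_entry(k, i, j):
    """Claimed entry of A^k: binom(k, j - i), taken to be 0 when j < i."""
    return comb(k, j - i) if j >= i else 0
```

Comparing `mat_pow(upper_bidiagonal(d), k)` entrywise with `binomial_entry` for a few small values of d and k is a quick sanity check of the identity.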

Computing $x_{1,t}$, $y_{1,t}$ (or equivalently $P_{1,t}$, $N_{1,t}$)
needs some care, and this is where several forms
of the threshold result are obtained.

Case 1: $\exists j_1, j_2 \ge 2$: $p_{j_1} = n_{j_2} = 1$. The
following is the result in this case:

Theorem 4.2 Let c > 0, and let $m = c \cdot n^{k-1}$.
Then the probability that PUR accepts $\Phi$ is equal to
$QEMPTY(c \cdot \frac{k!}{s_k} \cdot P_2(j))$.

The proof of the theorem goes along the
following lines:

1. As long as $P_{1,t}$ is "small" (sublinear),
$P_{1,t-1} \approx P_{1,t} - 1 + Po(t \cdot x_{2,t})$. This is
particularly true in the first $\theta(1)$ stages,
when $P_{1,t}$ can be approximated by a
queue with arrival distribution
$Po(c \cdot \frac{k!}{s_k} \cdot P_2(n - t))$. This explains the form
of the limit probability.

2. Also, in the first $\theta(1)$ stages $P_{1,t}$, $N_{1,t}$
are "small" (approximately constant),
so that w.h.p. PUR does not reject.

3. The probability that PUR accepts after
the first $\theta(1)$ stages is small, since, after
these stages $P_{1,t}$ will be large enough to
make a decrement to 0 unlikely.

4. At the stages $t = n - \theta(\sqrt{n})$, $P_{1,t}$, $N_{1,t}$
are large enough to guarantee the existence,
with nonnegligible probability, of
a variable that appears both as a positive
and a negative unit clause.

Let S be now (Horn ∪ 0-valid), $S_H = S \cap HORN$,
let $\Phi$ be a random formula and
$\Phi_H$ be its "Horn part". That SAT(S) has
the same (coarse) threshold as SAT($S_H$)
follows easily from the following set of
inequalities:

$$\Pr[\Phi \text{ has no positive unit clauses}] \le \Pr[\Phi \in SAT] \le \Pr[\Phi_H \in SAT].$$

Case 2: $\exists j_1 \ge 2$, $p_{j_1} = 1$ but $\forall j \ge 2$: $n_j = 0$.
Then the following holds:

Theorem 4.3 Let c > 0, and let $m = c \cdot n^{k-1}$.
Then the probability that PUR accepts $\Phi$ is equal to

$$e^{-c \frac{k!}{s_k}} + \left(1 - e^{-c \frac{k!}{s_k}}\right) \cdot QEMPTY\left(c \cdot \frac{k!}{s_k} \cdot P_2(j)\right).$$

The outline is quite similar to the one of the
previous case, with a couple of differences.

1. Now $N_{1,t}$ no longer grows, but remains
equal to $N_{1,n}$ for as long as the algorithm
does not halt. There exists a nonnegligible
(and asymptotically equal to $e^{-c \frac{k!}{s_k}}$)
probability that $N_{1,n} = 0$. In
this case 11...11 is a satisfying assignment.

2. In the opposite case the structure of
the proof (and conclusion) is similar to
the one from Case 1, except that,
since $N_{1,t}$ no longer grows, we have to
look up to $\theta(n)$ stages to be sure that
the algorithm has a nonnegligible probability
to reject. In this case the term
$\Delta^P_{1,t}$ can no longer be taken to be
approximately zero. One can, however,
get by, by noticing that, at those stages
where $P_{1,t}$ is $\theta(n)$, the probability that
there exists a positive unit clause opposite
to the negative unit clause guaranteed
by the condition $N_{1,n} > 0$ is
approximately constant. Iterating this
over a small but unbounded number of
steps allows us to conclude that for every
$\epsilon > 0$, with probability 1 - o(1) the
formula becomes unsatisfiable in one of
the first $\epsilon \cdot n$ stages. Taking $\epsilon$ small
enough so that $P_{1,t}$ is still nonzero after
$\epsilon \cdot n$ stages (if PUR hasn't already
stopped by this time) allows us to derive
the same form of the limit probability
as in Case 1.

The analysis of the (Horn ∪ 0-valid) case is
similar to the previous one.

Case 3: $\exists j_2 \ge 2$, $n_{j_2} = 1$ but $\forall j \ge 2$: $p_j = 0$.
In this case the threshold result is 

Theorem 4.4 Let c > 0, and let m = c ■ 

^fe-i+fcTT. Then the probability that PUR 
accepts $ is equal to 

e-c k+K W k + (1). 

The main steps of the analysis are: 

1. In this case $P_{1,t}$ is decreasing, but the
special form of the threshold makes
sure that $\Delta^P_{1,t}$ can be neglected, so
$P_{1,t-1} \approx P_{1,t} - 1$, and $P_{1,t} \approx P_{1,n} - (n - t)$.

2. On the other hand $N_{1,t}$ increases and
approximately satisfies the following
recurrence: $N_{1,t-1} \approx N_{1,t} + (t - 1) \cdot y_{2,t}$,
where $y_{2,t}$ can be computed as outlined
before.

3. The probability that the positive literal 
chosen at stage t occurs both in posi- 
tive and negative unit form is approxi- 

mately 1 — e t . 

4. The threshold interval is obtained
when the probability that the algorithm
rejects in the last $\theta(1)$ stages becomes
roughly constant (so that the
events "PUR accepts" and "PUR rejects"
compete).

5. A recursive computation yields the fi- 
nal form of the limit probability. 

An interesting thing happens when considering
the corresponding (Horn ∪ 0-valid) case: the
threshold interval is no longer the one from the
corresponding Horn case, but rather mirrors the
one in Cases 1 and 2. The underlying reason is
simple: the lower bound is the same as in Cases
1 and 2, the probability that $\Phi$ contains no positive
unit clause. To show an upper bound less
than one, consider applying PUR (which is no
longer complete) to our formula. With some positive
probability PUR will exhaust all the positive
unit literals (including those created on the
way) before accepting. Since S is not Horn, it
contains a clause template with $b \ge 2$ positive
literals.

Such clauses will result, when the positive unit
clauses are exhausted, into an at least linear
number of clauses of the type $C_{0,b}$. Together
with the "all negative" clauses these will ensure
that w.h.p. (at least for a big enough constant c)
the remaining formula is unsatisfiable. Thus the
probability that $\Phi$ is satisfiable is less than
1 - Pr[PUR exhausts all its positive unit clauses] - o(1).
The only case left uncovered by this argument
is when the only type of "all negative"
clauses are the unit clauses, but in this case one
can apply a similar reasoning by setting the variables
appearing in negative unit clauses too.

c. The argument is based on Friedgut's proof 
of the fact that 3-SAT has a sharp threshold, and 
we assume familiarity with the concepts and the 
methods in this paper. He first shows a gen- 
eral result that roughly states that graph (and 
hypergraph) problems that have coarse thresh- 
olds have a simple approximation at the thresh- 
old point. Here is a general and cleaner version 
of this result from J. Bourgain's appendix: 

Proposition 4.5 Let $A \subset \{0, 1\}^n$ be a monotone
property, and assume say

$$\epsilon < \mu_p(A) < 1 - \epsilon$$

for some $p = o(1)$ and $C > 0$. Then there is
$\delta = \delta(C)$ such that either

$$\mu_p(\{x \in \{0, 1\}^n \mid x \supset x' \in A, |x'| \le 10C\}) > \delta \qquad (1)$$

or there exists $x' \notin A$ of size $|x'| \le 10C$ such
that the conditional probability

$$\mu_p(x \in A \mid x \supset x') > \mu_p(A) + \delta. \qquad (2)$$

As a sanity check, let us see how this theorem 
applies to the three cases of HORN-SAT we have 
just analyzed. The set A is taken to be SAT(S). 

• In the first two cases condition (2) applies,
and the "magical" formula x' is simply a
fixed unit clause.

• In the last case condition (1) applies. The
"forbidden formula" x' consists of k different
unit clauses $x_1, \ldots, x_k$, together with
the clause $\bar{x}_1 \vee \ldots \vee \bar{x}_k$. An unexpected
outcome of the analysis is that the satisfiability
probability of a random formula $\Phi$
coincides within o(1) with the probability
that $\Phi$ contains no isomorphic copy of x'.

2 such p and C exist, assuming that the sharp threshold 
condition for A fails with respect to e > 0. 



Suppose S is neither (Horn ∪ 0-valid) nor
(negated Horn ∪ 1-valid).

Then S contains the clauses $C_{a_0,0}$ and $C_{0,b_0}$ and
$a_0, b_0 \ge 2$. Assume w.l.o.g. that $b_0 \le a_0$. According
to another theorem of Friedgut (that is
rederived by Bourgain as Corollary 3), there exists
$\gamma \in Q$ such that the value p from Proposition 4.5
is $\theta(n^{\gamma})$. Therefore the expected number
of copies of the clause $C_{0,b_0}$ in a random SAT(S)
formula is $\theta(n^{\gamma_1})$, for some rational number $\gamma_1$.
It is easy to see that $\gamma_1 \ge 0$. Indeed, suppose
otherwise. Then the expected number of copies
of $C_{0,b_0}$ in $\Phi$ is o(1), so with probability 1 - o(1)
$\Phi$ contains no clauses consisting of positive literals
only. Therefore with probability 1 - o(1) the
assignment $0^n$ satisfies $\Phi$, which is a contradiction.

Case 1: Suppose $b_{\ge 1} < b_0$.

In this case we want to show that SAT(S) has
a sharp threshold. A first observation is that
$\gamma_1 > 0$. Indeed, suppose $\gamma_1 = 0$ and consider
the formula $\Xi$ obtained from $\Phi$ in the following
manner: delete from each clause of $\Phi$ of length
at least $b_0$ (with probability 1 - o(1) all clauses of
$\Phi$ are like that) $b_0 - 1$ literals chosen as follows:

• If the clause has at most $b_0 - 1$ positive literals
delete them all; then delete a number
of random negative literals, so that in the
end we delete $b_0 - 1$ literals.

• Otherwise delete all but one of the $b_0$ positive
literals, chosen uniformly at random.

It is easy to see that $\Xi \in SAT \Rightarrow \Phi \in SAT$. $\Xi$
is a Horn formula, falling in the third category
(since, by the assumption $b_{\ge 1} < b_0$, no positive
remaining clause has length greater than 1). The
formula is not a uniform one (since clauses of the
same length do not have the same probability
of occurrence). However it can be made
so, while increasing the satisfaction probability,
by keeping only a fraction of the clauses that occur
with probability higher than the minimum
one among clauses of the same length. From b.
Case 3 it follows that with probability 1 - o(1)
$\Xi$ (therefore $\Phi$) is satisfiable, contradiction.
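
The literal-deletion step can be sketched as follows. The encoding is an assumption made for illustration: a clause is a list of distinct nonzero integers, +v for $x_v$ and -v for its negation, of length at least $b_0$ and with at most $b_0$ positive literals; the function name is hypothetical.

```python
import random

def shorten_clause(clause, b0, rng=random):
    """Delete b0 - 1 literals from `clause` as in the two bullets above.

    Assumes len(clause) >= b0 and at most b0 positive literals.
    The result is a subclause with at most one positive literal.
    """
    positives = [lit for lit in clause if lit > 0]
    negatives = [lit for lit in clause if lit < 0]
    if len(positives) <= b0 - 1:
        # Delete every positive literal, then enough random negative
        # literals to reach b0 - 1 deletions in total.
        extra = b0 - 1 - len(positives)
        removed = set(rng.sample(negatives, extra))
        return [lit for lit in negatives if lit not in removed]
    # Otherwise keep exactly one positive literal, chosen uniformly.
    return negatives + [rng.choice(positives)]
```

Every output clause is a subclause of its input, so any assignment satisfying the shortened formula also satisfies the original one, which is the implication used in the argument.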



We are now in position to outline how to mimic
Friedgut's argument to show a sharp threshold
in our case. Friedgut deals directly with the
monotone set A of k-DNF formulas that are tautologies,
and first shows that, assuming that this
set does not have a sharp threshold, it is the
alternative (2) that holds. This is evident for k-SAT,
but not in our case. Fortunately, we can
use some of his argument: assuming that the
other alternative holds, the critical value would
be $p = \theta(n^{-v/c})$, deriving from an unsatisfiable
formula F with v variables and c clauses. To
give this threshold, F is also balanced, that is,
has ratio clauses/variables higher than any of its
induced subformulas. Since F is unsatisfiable it
immediately follows that v < c. But this cannot
happen, since a first moment method easily
shows that in our case p = o(1/n).

He then proceeds to show that for k-SAT there
cannot exist a "magical" formula x' with the
properties guaranteed by Proposition 4.5. The



proof follows the following outline (the quotes 
below refer to statements in |Q) 

1. the nonexistence of a sharp threshold implies
the existence of a small "magical" formula
F, which is not itself a tautology, and
which boosts the probability that a random
formula $\Phi$ is a tautology, if we condition on
$\Phi$ containing a fixed copy of F, by a
nonnegligible ($\Omega(1)$) amount.

2. the existence of such a formula implies 
that adding a constant number of random 
clauses of size 1 to a random formula also 
boosts the probability of obtaining a tau- 
tology by a positive amount. 

3. finally, a contradiction is obtained by show- 
ing that were the conclusion of the previ- 
ous step true, then adding instead an ar- 
bitrarily small (but unbounded) number of 
clauses of size k would also be enough to 
boost the probability of obtaining a tautol- 
ogy. But such a statement can be refuted 
directly (Lemma 5.6). 

The heart of Friedgut's proof is Step 3, a geometric
argument, Lemma 5.7 in his paper. This is
where the special syntactical nature of k-SAT (or
rather, dually, k-DNF-TAUTOLOGY) appears:
according to Lemma 5.7, the probability that an
arbitrary subset of the hypercube $\{0, 1\}^n$ can be
covered with a small (but nonconstant) number
of hyperplanes of codimension k (corresponding
to DNF-clauses of length exactly k) is asymptotically
no smaller than the probability that it
can be covered with a constant number of hyperplanes
of codimension 1, whose existence is implied
by Proposition 4.5 via the process outlined
in steps 1, 2, 3. The clausal structure of k-SAT is
reflected by the correspondence between clauses
of size k and hyperplanes of codimension k, and
this correspondence will extend in our more general
case. The argument in Lemma 5.7 is not specific
to k-SAT, but works in some other cases,
if we replace, of course, hyperplanes of codimension
k by the corresponding type of hyperplanes
and make sure that the geometric argument still
works. For instance one can mimic the proof to
show that SAT($S_0$), where $S_0 = \{C_{a_0,0}, C_{0,b_0}\}$,
has a sharp threshold. A minor technical nuisance
is that now we need to consider two types
of hyperplanes of codimension larger than one,
corresponding to both types of clauses, but this
does not influence the overall reasoning.
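
The correspondence between clauses and "hyperplanes" of the hypercube can be made concrete: the assignments falsifying a clause with k literals form a subcube of codimension k (in the dual DNF view, the same subcube is the set of assignments accepted by the negated term). The sketch below enumerates that subcube; the encoding (+v for $x_v$, -v for its negation, variables numbered 1..n) and the function name are assumptions made for illustration.

```python
from itertools import product

def falsifying_assignments(clause, n):
    """List the assignments over variables 1..n that falsify `clause`.

    Each literal pins its variable to the value that makes the literal
    false; the remaining n - k variables are free, so the result is a
    subcube with 2^(n-k) points.
    """
    forced = {abs(lit): (0 if lit > 0 else 1) for lit in clause}
    free = [v for v in range(1, n + 1) if v not in forced]
    points = []
    for bits in product((0, 1), repeat=len(free)):
        values = dict(forced)
        values.update(zip(free, bits))
        points.append(tuple(values[v] for v in range(1, n + 1)))
    return points
```

A formula is unsatisfiable exactly when the subcubes of its clauses cover the whole hypercube, which is the covering picture used in the geometric argument.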

The idea of our argument is now rather transparent:
the rest of the steps in Friedgut's argument
extend more or less in a straightforward fashion,
and it is only the analog of Lemma 5.7 where we
need to see how the proof extends. In our case we
have a "large" (non-constant) number of copies
of $C_{a_0,0}$, $C_{0,b_0}$ in a random SAT(S) formula at
the critical value of p. They are used to "cover a
finite number of unit clauses". But this property
does not depend on the other types of clauses in
S, as long as we can make sure that we have
a non-constant number of copies of $C_{a_0,0}$, $C_{0,b_0}$
(this is where $\gamma_1 > 0$ comes into play).

These two types of clauses act as a "SAT($S_0$)"
core of the formula that is enough to ensure
that the geometric argument used to prove
that SAT($S_0$) has a sharp threshold holds for
SAT(S) as well. The structure of the proof in
this case is similar, at a very high level, to
that of Schaefer's dichotomy theorem: in the latter
case the canonical problem is 3-SAT and
NP-completeness follows from the ability to
"simulate" all clauses of length 3. For sharp/coarse
thresholds, the canonical problem is SAT($S_0$),
and the existence of a sharp threshold follows
from the ability to "simulate" both clauses in $S_0$.
Case 2: Suppose $a_0 = b_0 = b_{>1} < a_{>1}$.

The idea is similar to the one in Case 1: we show
first that the expected number of copies of $C_{a_0,0}$
and $C_{0,b_0}$ is not constant in the critical region,
and use Friedgut's argument for $S_0$. The deletion
process is almost identical to the one of the
previous section, except that, in order to avoid
creating "all negative" clauses of length greater
than 1, we do not delete the last positive literal
in a clause with fewer than $b_0$ positive literals, but
a random negative literal instead.

Case 3. 

Assume that we are in neither Case 1 nor
Case 2 because of the similar inequality for $a$.
In this case we want to show that SAT(S) has
a coarse threshold, occurring for p such that the
expected number of copies of $C_{0,b_0}$ is a constant
c. We have already seen that the probability
that a random formula $\Phi$ is satisfiable is lower
bounded by the probability that it contains no
copies of $C_{0,b_0}$. So we only need to argue that
the satisfaction probability is strictly less than
1 for some high enough value of the constant in
the definition of p.

The main ingredient of this proof, presented in
full in the final version of the paper, is the claim
that resolution will create the empty clause (thus
certifying that the formula is unsatisfiable) with
probability bounded away from 0. This is easy
to see if $a_0 > b_0$ and $b_{>1} > b_0$: consider first
the set of all variables that appear in a copy of
$C_{0,b_0}$ in $\Phi$ (the number of such clauses has a
Poisson distribution). The variables in these clauses
are different with probability $1 - o(1)$. A
satisfying assignment (if it exists) must set to true
at least one variable from each such clause. Choose
one variable from each such clause (there are,
on average, a constant number of ways to
do this) and replace each clause by the positive
unit clause consisting of the chosen variable. If
the original formula was satisfiable then the new
one is too, for at least one choice, corresponding
to a satisfying assignment.
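
The phrase "bounded away from zero" can be checked numerically for the Poisson count mentioned above. The following is a minimal sketch, assuming the number N of copies is exactly Poisson with mean c (in the text this holds only asymptotically); the function names and the threshold parameter `m` are ours.

```python
import math

def poisson_pmf(j, c):
    """Pr[N = j] for N Poisson with mean c."""
    return math.exp(-c) * c**j / math.factorial(j)

def prob_at_least(m, c):
    """Pr[N >= m] for N Poisson with mean c."""
    return 1.0 - sum(poisson_pmf(j, c) for j in range(m))

c = 2.0
p_none = poisson_pmf(0, c)     # equals exp(-c): no copies at all
p_three = prob_at_least(3, c)  # at least three copies
```

For any fixed c and m both quantities are constants strictly between 0 and 1, independent of n, which is the only property the argument uses.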

Let us consider the clauses of type $C_{a,b_{>1}}$ (with
$a > 1$ minimal) whose negative literals involve
chosen variables only, and whose positive literals
do not appear in the copies of $C_{0,b_0}$. When the
number of copies of $C_{0,b_0}$ is at least $a$ (which
happens with probability bounded away from zero)
resolution, applied to the new formula, will create
a number of copies of $C_{0,b_{>1}}$ with average
$\Omega(n)$ (since $b_{>1} > b_0$). W.h.p. the number
of such clauses is close to its expected value.
Consider now the new clauses of type $C_{0,b_{>1}}$
together with the initial clauses of type $C_{a_0,0}$.
With probability $1 - o(1)$ (if the constant in
the $\Theta(1)$ factor in p is large enough) this
formula is unsatisfiable. Thus resolution will
succeed with probability bounded away from zero.
A similar argument (but working with both
positive and negative variables) works for the case
$a_0 = b_0 < \min\{a_{>1}, b_{>1}\}$.
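
The resolution step used above can be illustrated on a toy instance. The sketch below is not the probabilistic argument itself: it only shows how resolving against positive unit clauses strips negative literals and can end in the empty clause. The encoding (signed integers for literals) and the names `resolve_with_units` and `unit_refute` are our own assumptions, and the procedure is deliberately restricted to positive units, so it is not a complete refutation method.

```python
def resolve_with_units(clause, units):
    """Resolve `clause` against positive unit clauses.

    Literals are nonzero ints: v is the positive literal of variable v,
    -v its negation. `units` is the set of variables asserted true.
    Every negative literal -v with v in `units` is resolved away.
    """
    return frozenset(lit for lit in clause if not (lit < 0 and -lit in units))

def unit_refute(clauses, units):
    """Repeat positive-unit resolution; True if the empty clause appears."""
    units = set(units)
    clauses = {frozenset(c) for c in clauses}
    changed = True
    while changed:
        changed = False
        reduced = {resolve_with_units(c, units) for c in clauses}
        if frozenset() in reduced:
            return True
        for c in reduced:
            # a newly derived positive unit clause extends the unit set
            if len(c) == 1:
                (lit,) = c
                if lit > 0 and lit not in units:
                    units.add(lit)
                    changed = True
        if reduced != clauses:
            changed = True
        clauses = reduced
    return False

# Chosen variables 1 and 2; a mixed clause and an all-negative clause.
refuted = unit_refute([{-1, -2, 3}, {-3}], units={1, 2})
```

In the toy instance the mixed clause becomes the positive clause on variable 3, which then clashes with the all-negative clause, mirroring on a small scale how the derived all-positive clauses are played against the all-negative ones.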

The only other remaining case is $b_{>1} = b_0 < a_0$.
Its analysis is slightly more involved, but relies
on the same idea: we create a linear number of
copies of $C_{0,b_0}$ by resolving all negative literals
from copies of $C_{a,b_{>1}}$. The number of copies of
$C_{0,b_0}$ at each phase is stochastically larger than
the number of customers in a queuing chain with
more clients arriving at each stage than those
that are served, hence with constant probability
it becomes linear. Moreover, since only $a$ of the
chosen literals can appear negatively, the growth
is substantially faster than that of the
corresponding queuing chain; in particular the
number of copies of $C_{0,b_0}$ becomes linear after at most
$n^{O(1)}$ iterations of the process. In this case the
resulting formula is also unsatisfiable with
probability $1 - o(1)$. So the conclusion is the same:
the satisfaction probability of a random
formula is (for large enough c) strictly less than one.
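
The queuing comparison can be visualized with a toy chain. The sketch below is an assumption-laden stand-in, not the process analyzed above: arrivals and services are independent Bernoulli events with hypothetical probabilities `arrive_p` and `serve_p`, and the function name is ours. When `arrive_p` exceeds `serve_p` the expected increment per step is at least `arrive_p - serve_p`, so the expected queue length grows linearly in the number of steps.

```python
import random

def queue_chain(steps, arrive_p, serve_p, seed=0):
    """Simulate a discrete-time queue; returns the length after each step.

    At each step one customer arrives with probability `arrive_p`, and,
    if the queue is nonempty, one is served with probability `serve_p`.
    """
    rng = random.Random(seed)
    length, history = 0, [0]
    for _ in range(steps):
        if rng.random() < arrive_p:
            length += 1
        if length > 0 and rng.random() < serve_p:
            length -= 1
        history.append(length)
    return history

# More arrivals than services on average: positive drift.
trajectory = queue_chain(steps=1000, arrive_p=0.6, serve_p=0.3)
```

The seed is fixed only to make runs repeatable; any statement about a single trajectory is of course random, whereas the drift bound concerns the expectation.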



□



5 Conclusions 

We have investigated the connection between worst- 
case complexity and the existence of phase
transitions. Our result shows that some
connection between the two concepts exists after
all: while it is not as clean as the one hoped for in ||, 
the lack of a phase transition has significant compu- 
tational implications: such problems are either com- 
putationally tractable, or well-predicted by a single, 
trivial algorithm. 

Several open problems remain: a first one is to ex- 
tend our result to the whole class of generalized sat- 
isfiability problems. We believe that obtaining such 
a characterization is interesting even though the
motivating conjecture isn't true. Another question is
whether we can apply our techniques to
constraint programming problems (i.e., satisfiability over
non-binary domains). Obtaining a complete version
of Schaefer's dichotomy theorem in this case is still 
open; however we believe that some of our results 
should carry over. 

A third, perhaps the most interesting, open ques- 
tion is to elucidate the connection between compu- 
tational complexity and the "physical" concept of 
first-order phase transition. As we have mentioned, 
the class of problems with such phase transitions is a 
subset of the class of problems with sharp thresholds. 
For clausal generalized satisfiability problems the
inclusion is strict: Bollobás et al. [12] have shown that
the phase transition in 2-SAT is of second-order. The
proof can perhaps be adapted for any (nontrivial)
clausal version of 2-SAT. It is tempting to conjecture
that at least in the clausal case these are all such
examples. The non-clausal case is bound to be
substantially more complex: work in progress [15] suggests
that there exists a (non-clausal) NP-complete gener- 
alized satisfiability problem with the same width of 
the scaling window (and order of the phase transi- 
tion) as 2-SAT. Obtaining any further results is an 
interesting challenge. 
Appendix

Proposition 5.1 For every polynomial time degree
$\mathcal{D}$ there exist monotone NP-decision problems
$A, B \in \mathcal{D}$ such that

• A has a coarse threshold.

• B has a sharp threshold.

Proof sketch: Start with two problems $C, D \in P$
that have a coarse (respectively sharp) threshold (for
concreteness, the property that a graph contains a
triangle and 2-UNSAT, respectively). Let $E \in \mathcal{D}$.
Encode E into a monotonically increasing set F such that
$E \equiv^{p} F$ and $\mu_p(F) \to 1$ as $n \to \infty$ for
every p in the "critical region" of C. Define the set A
to be the set $C \circ F = \{xy \mid x \in C, y \in F, |x| = |y|\}$.
It is easy to see that $\mu_p(A) = \mu_p(C)(1 + o(1))$, so A
has a coarse threshold. Moreover $A \in \mathcal{D}$. Set B
is constructed in a similar fashion. □
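
The concatenation used in the proof sketch is easy to state as a membership test. The following sketch is illustrative only: the function name `in_concat` is ours, and the two predicates are hypothetical monotone stand-ins, not the triangle property or an encoding of an arbitrary problem in the degree. Under a product measure the two halves of a string are independent, which is presumably why the measure of the concatenation is the product of the measures of the two sets, and hence equals the measure of the first set up to a factor of 1 + o(1).

```python
def in_concat(w, in_C, in_F):
    """Membership in C o F = {xy : x in C, y in F, |x| = |y|}."""
    if len(w) % 2:
        return False          # no split into two halves of equal length
    half = len(w) // 2
    return in_C(w[:half]) and in_F(w[half:])

# Hypothetical monotone stand-ins (flipping a 0 to a 1 never leaves the set).
in_C = lambda x: x.count("1") >= 2   # plays the role of the set C
in_F = lambda y: "1" in y            # plays the role of the set F

member = in_concat("1101", in_C, in_F)
```

Since both stand-ins are monotone, so is the concatenation, matching the requirement that A be a monotone problem.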



